Separating Populations with Wide Data: A Spectral Analysis
Abstract
In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of k product distributions. We are interested in the case where individual features are of low average quality γ, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that approximately optimizes the total data size needed to perform this partitioning correctly (the product of the number of data points n and the number of features K) as a function of 1/γ for K > n. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
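The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of spectral partitioning, under stated assumptions: two populations (k = 2), binary features, and a hypothetical toy model in which each feature's frequency differs between the populations by only ±delta, so that no single feature separates them well on its own. The sketch centers every feature, forms the n × n Gram matrix, finds its leading eigenvector by power iteration, and splits the points by the sign of that eigenvector. This is a generic spectral-clustering recipe, not necessarily the exact procedure analyzed in the paper; the names `sample_mixture`, `spectral_partition`, and `agreement` and all parameter values are hypothetical.

```python
import math
import random


def sample_mixture(n, num_features, delta, seed=0):
    """Draw n binary samples from a mixture of two product distributions.

    Hypothetical toy model (an assumption, not from the paper): feature j is 1
    with probability 0.5 + delta * s_j in population 0 and 0.5 - delta * s_j in
    population 1, where s_j is a random sign.
    """
    rng = random.Random(seed)
    signs = [rng.choice((-1, 1)) for _ in range(num_features)]
    labels = [i % 2 for i in range(n)]  # balanced populations
    data = []
    for label in labels:
        direction = 1 if label == 0 else -1
        row = [1.0 if rng.random() < 0.5 + direction * delta * s else 0.0
               for s in signs]
        data.append(row)
    return data, labels


def spectral_partition(data, iterations=100, seed=1):
    """Split rows into two groups by the sign of the leading eigenvector of
    the Gram matrix of the column-centered data, found by power iteration."""
    n, num_features = len(data), len(data[0])
    # Center every feature (column) so that the population difference remains.
    means = [sum(row[j] for row in data) / n for j in range(num_features)]
    centered = [[row[j] - means[j] for j in range(num_features)] for row in data]
    # n x n Gram matrix: cheaper to iterate on than the n x K data when K > n.
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            value = sum(a * b for a, b in zip(centered[i], centered[k]))
            gram[i][k] = gram[k][i] = value
    # Power iteration; the Gram matrix is positive semidefinite.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(gram_row, v)) for gram_row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [0 if x >= 0 else 1 for x in v]


def agreement(predicted, truth):
    """Fraction of correctly grouped points, up to swapping the two labels."""
    matches = sum(p == t for p, t in zip(predicted, truth))
    return max(matches, len(truth) - matches) / len(truth)


if __name__ == "__main__":
    # Wide data: many weak features (K = 1000) and few points (n = 40).
    data, labels = sample_mixture(n=40, num_features=1000, delta=0.2)
    print(agreement(spectral_partition(data), labels))
```

Working with the n × n Gram matrix rather than the n × K data matrix keeps each power-iteration step cheap in the wide regime K > n that the abstract considers; shrinking `delta` or `num_features` in this toy model is one way to see the partition degrade as individual features get weaker.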
Publication date: 2007